Multitemporal Cloud Masking

The input to our algorithm will be a landsat L8 image together with a region of interest where we want to: (1) detect the clouds and (2) predict the background.

In [1]:
import ee

from datetime import datetime
from IPython.display import Image
from IPython.display import display

from ee_ipl_uv import multitemporal_cloud_masking
from ee_ipl_uv import download
from ee_ipl_uv import time_series_operations
from ee_ipl_uv import time_series_show
from ee_ipl_uv import predefined_cloud_algorithms


ee.Initialize()

# Select image to remove clouds
image_predict_clouds = ee.Image('LANDSAT/LC8_L1T_TOA_FMASK/LC81990332015174LGN00')
bands_original = image_predict_clouds.bandNames().getInfo()

# Select region of interest
pol = [[-0.50262451171875,39.39269330108945],
 [-0.267791748046875,39.38526381099777],
 [-0.26092529296875,39.54005788576377],
 [-0.501251220703125,39.53793974517628],
 [-0.50262451171875,39.39269330108945]]

region_of_interest = ee.Geometry.Polygon(pol)
In [2]:
# hide_me
class Caption():
    def __init__(self,s):
        self.s = s
    def _repr_html_(self):
        return '<center>{0}</center>'.format(self.s)
    def _repr_latex_(self):
        return '\\begin{center}\n'+self.s+'\n\\end{center}'
In [3]:
    
# Show image
image_file_original = download.MaybeDownloadThumb(image_predict_clouds.clip(region_of_interest),
                                                  params={"max":.3,"bands":"B4,B3,B2"})

display(Image(image_file_original),Caption('Clipped landsat 8 TOA image over region of interest'))
Clipped landsat 8 TOA image over region of interest

Search for previous images within the collection without clouds

In order to build the model we will need past images of the current image of interest $X_t$. $X_t \in \mathbb{R}(p \times B)$, $p \equiv \text{number of pixels}$, $B \equiv \text{number of bands}$.

Threfore we want to construct $Y_t = [X_t,X_{t-1},X_{t-2},X_{t-3}]$, $Y_t \in \mathbb{R}(p \times 4 \cdot B)$ Where we define $X_{t-j}$ being the $j$th previous image to $t$ without clouds.

SelectImagesTraining performs this operation selecting the num_images most recent images without clouds within the current landsat image collection. It returns the current image together with the bands of the previous images. We have used GEE ee.Algorithms.Landsat.simpleCloudScore algorithm as criteria to predetect clouds.

In [4]:
# Select max_lags scenes previous to the selected images and stack all those images together with this on bands
max_lags = 3 
image_with_lags = multitemporal_cloud_masking.SelectImagesTraining(image_predict_clouds,region_of_interest,
                                                                   num_images=max_lags)

Show selected images and its dates

image_with_lags is an image with the same bands as our original image together with its lags.

In [5]:
# Print band names
print("Bands: ",", ".join(image_with_lags.bandNames().getInfo()))
('Bands: ', u'B1, B2, B3, B4, B5, B6, B7, B8, B9, B10, B11, BQA, fmask, B1_lag_1, B2_lag_1, B3_lag_1, B4_lag_1, B5_lag_1, B6_lag_1, B7_lag_1, B8_lag_1, B9_lag_1, B10_lag_1, B11_lag_1, BQA_lag_1, fmask_lag_1, B1_lag_2, B2_lag_2, B3_lag_2, B4_lag_2, B5_lag_2, B6_lag_2, B7_lag_2, B8_lag_2, B9_lag_2, B10_lag_2, B11_lag_2, BQA_lag_2, fmask_lag_2, B1_lag_3, B2_lag_3, B3_lag_3, B4_lag_3, B5_lag_3, B6_lag_3, B7_lag_3, B8_lag_3, B9_lag_3, B10_lag_3, B11_lag_3, BQA_lag_3, fmask_lag_3')

image_with_lags has in its metadata properties the timestamps of the used lagged images. We will print them to show how can we access:

In [6]:
# Print time stamps 
dictionary_server = ee.Dictionary()
for lag in range(1,max_lags+1):
    prop = "system:time_start_lag_"+str(lag)
    dictionary_server = ee.Dictionary(dictionary_server).set(prop,
                                                             image_with_lags.get(prop))

dict_local = dictionary_server.getInfo()
for k in dict_local: 
    dict_local[k] = datetime.utcfromtimestamp(dict_local[k]/1000)

print(dict_local)
{u'system:time_start_lag_2': datetime.datetime(2015, 5, 31, 10, 36, 32), u'system:time_start_lag_3': datetime.datetime(2015, 5, 22, 10, 42, 40), u'system:time_start_lag_1': datetime.datetime(2015, 6, 7, 10, 42, 49)}

Finally we will show the image (left) together with its chosen predecesors sorted (more left more recent less lag).

In [7]:
# Show "image": lag_3,lag_2,lag_1 and current image:
img_file = time_series_show.ShowLaggedThumbs(image_with_lags.clip(region_of_interest),
                                             range(0,max_lags+1),
                                             bands=["B4","B3","B2"],
                                             params={"max":.3})

display(Image(img_file),Caption('Image (right) with previous images  with lags: '+
                                'lag_1 (second right), lag_2 (second left), lag_3 (left)'))
Image (right) with previous images with lags: lag_1 (second right), lag_2 (second left), lag_3 (left)

Use precomputed cloud mask for Estimation

The model as it is described in [], requires a precomputed cloud mask to improve performance using pixels from the current image (estimation). We will use QABand of landsat which has a cloud mask defined on it.

In [8]:
clouds_original = predefined_cloud_algorithms.QACloudMask(image_predict_clouds,strict=False)
#clouds_original = image_predict_clouds.select("fmask").eq(2)
#clouds_original = clouds_original.where(image_predict_clouds.select("fmask").eq(4),2)

# Add growing to the mask

clouds = clouds_original.reduceNeighborhood(ee.Reducer.max(),
                                            ee.Kernel.circle(radius=3))

clouds_file = download.MaybeDownloadThumb(clouds.clip(region_of_interest),
                                          params={'max':2,'min':0,
                                                  'palette':','.join(['000000','FF0000','0000FF'])})
clouds_file_original = download.MaybeDownloadThumb(clouds_original.clip(region_of_interest),
                                                   params={'max':2,'min':0,
  'palette':','.join(['000000','FF0000','0000FF'])})

name_composite_images = download.MosaicImageList([clouds_file_original,clouds_file,image_file_original],
                                                 [1,3])

display(Image(name_composite_images),
        Caption('QABand cloud mask (left) 3-pixel grown QABand cloud mask (center) original image (right)'))
QABand cloud mask (left) 3-pixel grown QABand cloud mask (center) original image (right)

Build the model

Given the image with its lags $Y_t$, we will form 2 data sets:

  • Collection for prediction $P$: will be formed by pixels $z(p)\in \mathbb{R}(3\cdot B)$ from $[X_{t-1},X_{t-2},X_{t-3}]$. Those pixels will be divided on input: $x(p)\in \mathbb{R}(2\cdot B)$ from $P_x:= [X_{t-2},X_{t-3}]$ and output $y(p)\in \mathbb{R}(B)$ from $P_y:=X_{t-1}$.
  • Collection for estimation $E$: will be formed by cloud free pixels $z(p)\in \mathbb{R}(3\cdot B)$ from $[X_{t},X_{t-1},X_{t-2}]$. Those pixels will be divided on input: $x(p)\in \mathbb{R}(2\cdot B)$ from $E_x:=[X_{t-1},X_{t-2}]$ and output $y(p)\in \mathbb{R}(B)$ from $E_y:=X_{t}$.

The model we want to optimize is: $$ \mathcal{L}(W) = \frac{\beta}{n}||(P_xW-P_y)||^2 + \frac{1-\beta}{m}||E_xW-E_y||^2 + \lambda||W||^2 $$ Which can be written as: $$ \mathcal{L}(W) = \left \| diag\begin{pmatrix} \frac{\beta}{n} \\ \frac{1-\beta}{m} \end{pmatrix}^{1/2} \begin{pmatrix} P_x \\ E_x \end{pmatrix} W - diag\begin{pmatrix} \frac{\beta}{n} \\ \frac{1-\beta}{m} \end{pmatrix}^{1/2} \begin{pmatrix} P_y \\ E_y \end{pmatrix} \right \|^2+ \lambda||W||^2 $$

ModelCloudMasking class implements the datasets for the model as it is described. It takes as inputs the image with its lags ($Y_t$), the bands used in the model and the cloud mask for estimation. It builds the estimation and prediction dataset. Finally it has 4 main functions: TrainLinearServer, TrainRBFKernelServer, PredictLinear and PredictRBFKernel to train and predict the proposed models.

We will show now the images used for prediction (top) and the ones used for estimation (bottom). They correspond to lag_3,lag_2, lag_1 for prediction and lag_2, lag_1, original_img for estimation.

lag_3,lag_2 will form $E_x$ dataset, lag_1 $E_y$, lag_2, lag_1 $P_x$ and original_img $P_y$.

In [9]:
bands_model = ["B1","B2", "B3", "B4", "B5", "B6", "B7","B8","B9","B10", "B11"]
modelo = multitemporal_cloud_masking.ModelCloudMasking(image_with_lags,bands_model,
                                                       clouds,max_lags,region_of_interest)

img_file_pred = time_series_show.ShowLaggedThumbs(image_with_lags.clip(region_of_interest),
                                                 range(1,max_lags+1),
                                                 bands=["B4","B3","B2"],
                                                 params={"max":.3})
img_file_est = time_series_show.ShowLaggedThumbs(modelo.img_est.clip(region_of_interest),
                                                 range(0,max_lags),
                                                 bands=["B4","B3","B2"],
                                                 params={"max":.3})

img_files_pred_est = download.MosaicImageList([img_file_pred,img_file_est],
                                              [2,1])
display(Image(img_files_pred_est),
        Caption('Prediction images (3 top: lag_3, lag_2, lag_1 from left to right).'+
                ' (3 bottom: lag_2, lag_1, original from left to right)'))
Prediction images (3 top: lag_3, lag_2, lag_1 from left to right). (3 bottom: lag_2, lag_1, original from left to right)

Kernel Ridge Regression Model

In [15]:
lmbda_kernel = 0.001
gamma_kernel = 0.01
sf_kernel = 0.01

modelo.TrainRBFKernelServer(sampling_factor=sf_kernel,lmbda=lmbda_kernel,gamma=gamma_kernel)

img_forecast_kernel = modelo.PredictRBFKernel()

display(Image("batch_images/kernel_forecast.png"),
        Caption('True image (left) vs RBF Kernelized Ridge Regression forecast image (right)'))
True image (left) vs RBF Kernelized Ridge Regression forecast image (right)

Note: The image displayed above has been downloaded previously. This is because the current Earth Engine does not allow computationally expensive invocations on interactive mode. The above invocation can only be executed as an export task: batch mode. We have downloaded the image and preprocessed to show in the current notebook.

Linear Ridge Model

Hereby we will train the linear ridge regression model and apply it to the current image having now a forecasted image without clouds.

In [14]:
# Linear model:
modelo.TrainLinearServer(sampling_factor=.1,lmbda=1e-6)

img_forecast_linear = modelo.PredictLinear()

display(Image("batch_images/linear_forecast.png"),
        Caption('True image (left) vs Linear Ridge Regression forecast image (right)'))
True image (left) vs Linear Ridge Regression forecast image (right)

Cloud Mask

With the forecasted cloud-free image we will compute useful features for cloud detection. Afterwards we will perform a simple k-means clustering over the difference of such features.

Download the image

Given that GEE does not have clustering algorithms implemented, we have to download the image to perform the clustering. This is also necessary because image forecasting task is too expensive to be executed in interactive mode.

In [16]:
img_forecast = img_forecast_kernel

# Compute usefull features for cloud detection on forecasted image and original image
img_forecast_features, bnames_features_forecast = multitemporal_cloud_masking.ComputeFeatures(
    img_forecast, "_forecast")

features_img_original, bnames_features = multitemporal_cloud_masking.ComputeFeatures(
        image_predict_clouds)

# Concat image to download.
img_forecast = img_forecast.addBands(img_forecast_features)
bands_forecast.extend(bnames_features_forecast)
image_predict_clouds_with_features = image_predict_clouds.addBands(features_img_original)
bands_forecast.extend(bands_original)
img_forecast = img_forecast.addBands(image_predict_clouds_with_features.double())
bands_forecast.extend(bnames_features)

# Download the image 
# (uncomment to run)
# task = ee.batch.Export.image.toDrive(img_forecast,
#                                         description="kernel_1",
#                                         folder="folder_on_your_google_drive_associated_account")
# task.start()
# Save info for reading the geotif
# info["gamma"] = gamma_kernel
# info["lambda"] = lmbda_kernel
# info["sampling_factor"] = sf_kernel
# info["model"] = ="kernel_1"
# info["bands"] = bands_forecast
# # Save the info file
# json_file = 'batch_images/kernel_1/info.json'
# with open(json_file, 'w') as outfile:
#        json.dump(info, outfile)
# 
# task.status()

Compute Cloud Mask

Once the data is downloaded as a geotif we can do the clustering locally.

First load the geotif as numpy arrays:

In [23]:
from osgeo import gdal
from ee_ipl_uv import clustering
import numpy as np
import json

model_name = "kernel_1"
path="batch_images/"+model_name+"/"

# Load band names
with open(path+"info.json","r") as infile:
    info = json.load(infile)

# Load geotif
ds = gdal.Open(path+model_name+".tif")

# Compute band differences on cloud_masking.FEATURES_CLOUDS feature space
lista_bandas = list(multitemporal_cloud_masking.FEATURES_CLOUDS)
bands_model = ["B1","B2", "B3", "B4", "B5", "B6", "B7","B8","B9","B10", "B11"]
lista_bandas.extend(bands_model)
lista_valores = []
for banda in lista_bandas:
    banda_forecast = banda + "_forecast"
    i_banda = info["bands"][banda] + 1 # gdal is 1-based array!!!
    i_banda_forecast = info["bands"][banda_forecast] + 1 
    d2_array = ds.GetRasterBand(i_banda).ReadAsArray()-ds.GetRasterBand(i_banda_forecast).ReadAsArray()
    lista_valores.append(d2_array)

differences = np.dstack(tuple(lista_valores))
differences_features = np.dstack(tuple(lista_valores[:len(multitemporal_cloud_masking.FEATURES_CLOUDS)]))

lista_valores = []
for banda in ["B1","B2", "B3", "B4", "B5", "B6"]:
    i_banda = info["bands"][banda] + 1 # gdal is 1-based array!!!
    d2_array = ds.GetRasterBand(i_banda).ReadAsArray()
    lista_valores.append(d2_array)
radiancia_original = np.dstack(tuple(lista_valores))

lista_valores = None

Try different cluster configurations.

Using only diferences in feature space or also in feature and radiance space. Clusters are ordered by radiance intensity (more brighter clusters high cluster number):

In [20]:
mascara_3 = clustering.ClusterClouds(differences,radiancia_original,n_clusters=3)
mascara_5 = clustering.ClusterClouds(differences,radiancia_original,n_clusters=5)
mascara_3_features = clustering.ClusterClouds(differences_features,radiancia_original,n_clusters=3)
mascara_5_features = clustering.ClusterClouds(differences_features,radiancia_original,n_clusters=5)
In [21]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(10, 10))
plt.subplot(221)
clustering.ImagenMascara(mascara_3)
plt.title('Features + radiances 3')
plt.subplot(222)  
clustering.ImagenMascara(mascara_5)
plt.title('Features + radiances 5')
plt.subplot(223)  
clustering.ImagenMascara(mascara_3_features)
plt.title('Features 3')
plt.subplot(224)  
clustering.ImagenMascara(mascara_5_features)
plt.title('Features 5')
plt.subplots_adjust(wspace=0.04, hspace=0,left=0, bottom=0, right=1, top=1)
plt.show()
plt.close()

Compairson with fmask mask

In [47]:
# Read gdal band
f_mask_band = info["bands"]["fmask"] + 1 # gdal is 1-based array!!!
d2_array = ds.GetRasterBand(f_mask_band).ReadAsArray()
d2_array[np.isnan(d2_array)] = -1
d2_array = d2_array.astype(int)

# Get only clouds and shadows https://code.earthengine.google.com/dataset/LANDSAT/LC8_L1T_TOA_FMASK
mascara_fmask = np.where(d2_array == 4,2,0)
mascara_fmask = np.where(d2_array == 2,1,mascara_fmask)
mascara_fmask = np.where(d2_array == -1,-1,mascara_fmask)

plt.figure(figsize=(12, 8))

plt.subplot(131)
clustering.ImagenMascara(mascara_fmask)
plt.title('fmask')

plt.subplot(132)
raster_original = np.clip(radiancia_original[:,:,[3,2,1]], 0, .3) /.3
plt.imshow(raster_original)
plt.axis('off')
plt.title("Original")

plt.subplot(133)
clustering.ImagenMascara(mascara_3)
plt.title('Features + radiances 3')
plt.subplots_adjust(wspace=0.04, hspace=0,left=0, bottom=0, right=1, top=1)

plt.show()
plt.close()